IGSR provides open data to support the community’s research efforts. You can see our terms of use in our data disclaimer.
We have developed a data portal to make it easier to find and browse data in IGSR. Let us know what you think at info@1000genomes.org.
1000 Genomes Release | Variants | Individuals | Populations | VCF | Alignments | Supporting Data |
---|---|---|---|---|---|---|
Phase 3 | 84.4 million | 2504 | 26 | VCF | Alignments | Supporting Data |
Phase 1 | 37.9 million | 1092 | 14 | VCF | Alignments | Supporting Data |
Pilot | 14.8 million | 179 | 4 | VCF | Alignments | Supporting Data |
A summary of sequencing done for each of the three pilot projects is available here.
The list of samples collected by the project, along with the sequence data or other assay data generated for each of them, is available in this spreadsheet.
Our variant calls are always released in VCF format. The released calls from the final phase of the 1000 Genomes Project can be found in the release directory for 2nd May 2013 on the EBI FTP site.
Alignments are available in BAM or CRAM format. Within IGSR, data are grouped in data collections, such as the 1000 Genomes Project or the Illumina Platinum Genomes. A list of the alignment files currently available for a given data collection can be found in the alignment index for that collection on the EBI FTP site. Information on the contents of the index file can be found in the file header.
Sequence data are available from the ENA. A list of the files currently available can be found in the sequence.index file for each data collection on the EBI FTP site. These files contain the FTP URL for each FASTQ sequence file, as well as other metadata about the sequencing run and file. More information on the contents of the index file can be found in the file header.
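The alignment and sequence index files described above can be read with a few lines of Python. The following is a minimal sketch, not an official parser. It assumes the index is tab-delimited, that header lines start with `#`, and that the last `#` line names the columns; check the header of the actual index file before relying on this. The column names and values in the sample text (`FASTQ_FILE`, `RUN_ID`, `SAMPLE_NAME`) are hypothetical and only illustrate the shape of the data.

```python
def parse_index(text):
    """Split index file text into header lines and a list of row dicts.

    Assumptions (not confirmed by the source): lines starting with '#'
    are header lines, the last of them names the columns, and all
    fields are separated by tabs.
    """
    header_lines = []
    data_lines = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            header_lines.append(line)
        else:
            data_lines.append(line)

    if not header_lines:
        raise ValueError("no header line found; cannot name the columns")

    # Column names come from the last header line, with the '#' prefix removed.
    columns = header_lines[-1].lstrip("#").strip().split("\t")
    rows = [dict(zip(columns, line.split("\t"))) for line in data_lines]
    return header_lines, rows


# Hypothetical sample content, for illustration only.
sample = (
    "## description line\n"
    "#FASTQ_FILE\tRUN_ID\tSAMPLE_NAME\n"
    "ftp://example.org/a.fastq.gz\tRUN1\tSAMPLE1\n"
)
header, rows = parse_index(sample)
for row in rows:
    print(row["SAMPLE_NAME"], row["FASTQ_FILE"])
```

Returning the header lines alongside the rows keeps the file's own column documentation available to the caller.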
All the samples studied by the 1000 Genomes Project are available as DNA and cell lines to scientific investigators for research projects. Samples are currently available from the non-profit Coriell Institute for Medical Research. Details of the population collections available from Coriell can be found on the cell lines and DNA page.
Data from the 1000 Genomes Project can be viewed in genomic context in genome browsers. Further details about browsing the data in this way can be found here.
The data contained in IGSR can be downloaded from the FTP site hosted at the EBI: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/
The data can be downloaded via FTP, Aspera and Globus GridFTP. More information about using Aspera or Globus can be found in our FAQ.
How to download files using Aspera
How to download files using Globus
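For plain FTP downloads, a script can use Python's standard `ftplib`. The sketch below is one possible approach rather than a recommended client. It assumes the server accepts anonymous login, and the helper names `split_ftp_url` and `download` are made up for this example.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url):
    """Return (host, path) for an ftp:// URL."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp":
        raise ValueError("expected an ftp:// URL, got: " + url)
    return parsed.hostname, parsed.path


def download(url, dest):
    """Fetch one file over FTP and write it to the local path dest.

    Assumption: the server allows anonymous login.
    """
    host, path = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        with open(dest, "wb") as out:
            ftp.retrbinary("RETR " + path, out.write)


print(split_ftp_url("ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/"))
```

To fetch a file, call `download` with the full FTP URL of that file and a local file name. For large files such as alignments, Aspera or Globus may be a better choice than FTP.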
The FTP structure was changed in September 2015. The new structure is described in the FTP site structure README.
During the main 1000 Genomes Project, the NCBI acted as a mirror of the EBI-hosted 1000 Genomes FTP site and also uploaded alignments and variant calls to an Amazon S3 bucket. This mirroring process stopped in September 2015. The NCBI FTP site and the Amazon S3 bucket still host 1000 Genomes data but no longer mirror new data. Both of these locations reflect the structure of the FTP site in August 2015 and hold all the pilot, phase 1 and phase 3 data. NCBI and Amazon do not hold new alignments based on GRCh38, the current reference genome.
NCBI FTP Site : ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp
Amazon S3 : s3://1000genomes
Information on Amazon Web Services can be found on the 1000 Genomes public data set page or directly at http://s3.amazonaws.com/1000genomes.
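Because the mirrors reflect the FTP site as it was in August 2015, a path from that older layout can in principle be rewritten to point at either mirror. The sketch below assumes that the relative path under each root is identical, which the source implies but does not state outright. It will not give valid locations for files added or moved after the September 2015 restructure, or for GRCh38 alignments. The function name `mirror_urls` and the example path are hypothetical.

```python
EBI_ROOT = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/"

# Mirror roots as listed above, with a trailing slash added for joining.
MIRROR_ROOTS = {
    "ncbi": "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/",
    "s3": "s3://1000genomes/",
    "s3_http": "http://s3.amazonaws.com/1000genomes/",
}


def mirror_urls(ebi_url):
    """Map an EBI FTP URL to candidate mirror URLs.

    Assumption: mirrors use the same relative path as the EBI site did
    in August 2015. The results are candidates, not guaranteed locations.
    """
    if not ebi_url.startswith(EBI_ROOT):
        raise ValueError("URL is not under the EBI FTP root: " + ebi_url)
    relative = ebi_url[len(EBI_ROOT):]
    return {name: root + relative for name, root in MIRROR_ROOTS.items()}


# Placeholder path, for illustration only.
candidates = mirror_urls(EBI_ROOT + "some/dir/file.txt")
for name, url in candidates.items():
    print(name, url)
```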